1.3 - Our Solution - The Gym Environment Wrapper

To resolve the async vs. sync conflict, we will architect a solution using a well-established software design pattern: an Adapter. We will create a custom Python class that acts as a bridge, presenting a standard, synchronous gymnasium interface to the RL agent, while internally managing the complex, asynchronous python-sc2 game process.

This wrapper is the cornerstone of our entire RL implementation.

The gymnasium.Env API Contract

stable-baselines3 requires its environments to adhere to the gymnasium.Env API. Our wrapper must implement this contract precisely.

Method Signature                       Role in Our Wrapper

__init__(self, ...)                    Defines the action_space and observation_space.
                                       Creates the communication queues.

reset(self, seed=None, options=None)   Episode start. Terminates any old game process,
                                       launches a new one, waits for the initial
                                       observation, and returns it.

step(self, action)                     Core interaction. Takes an action from the agent,
                                       puts it on the action queue, waits for the game
                                       to process it, retrieves the result from the
                                       observation queue, and returns it.

close(self)                            Cleanup. Ensures the python-sc2 game process is
                                       properly terminated when training finishes.
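
To make this contract concrete, here is a minimal sketch of how the four methods map onto queue operations. It is a skeleton, not the final implementation: the spaces are placeholders (the real shapes will come from our state and action design), and the process-launch step inside reset is deferred to the next chapter.

```python
import multiprocessing

import gymnasium as gym
import numpy as np
from gymnasium import spaces


class SC2GymEnv(gym.Env):
    """Adapter: a synchronous gymnasium.Env facade over the async game."""

    def __init__(self):
        super().__init__()
        # Placeholder spaces; the real ones depend on our state/action encoding.
        self.action_space = spaces.Discrete(4)
        self.observation_space = spaces.Box(0.0, 1.0, shape=(8,), dtype=np.float32)
        # Queues that carry data between this process and the game process.
        self.action_queue = multiprocessing.Queue()
        self.obs_queue = multiprocessing.Queue()
        self.game_process = None

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.close()  # terminate any previous game process
        # Launch a fresh python-sc2 game in a child process (next chapter):
        # self.game_process = multiprocessing.Process(target=..., args=(...))
        # self.game_process.start()
        obs = self.obs_queue.get()  # block until the first observation arrives
        return obs, {}

    def step(self, action):
        self.action_queue.put(action)  # hand the action to the game process
        # Block until the game has advanced one step and reported back.
        obs, reward, terminated, truncated, info = self.obs_queue.get()
        return obs, reward, terminated, truncated, info

    def close(self):
        if self.game_process is not None:
            self.game_process.terminate()
            self.game_process = None
```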

The Implementation Plan

Our SC2GymEnv wrapper will be responsible for orchestrating the two processes, and our BotAI subclass, RLBotAI, will need to be modified to listen for commands from the wrapper.

  • SC2GymEnv (The Wrapper - Process 1):

    • Inherit from gymnasium.Env.
    • Initialize multiprocessing.Queue for actions and observations.
    • Implement reset to manage the game process lifecycle.
    • Implement step to pass data between queues.
    • Implement close for clean shutdown.
  • RLBotAI (The Game Actor - Process 2), sketched after this list:

    • Be initialized with the communication queues.
    • In on_step, get an action from the action queue (blocking).
    • Execute the action in the game world.
    • Calculate the observation and reward.
    • Put the (obs, reward, terminated, ...) tuple on the observation queue.
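
A matching sketch of the game-actor side. It assumes the import path used by current python-sc2 releases (sc2.bot_ai), and the three helpers execute_action, build_observation, and compute_reward are hypothetical stand-ins for logic we will write later:

```python
from sc2.bot_ai import BotAI


class RLBotAI(BotAI):
    """Game-side actor: executes the agent's actions and reports results."""

    def __init__(self, action_queue, obs_queue):
        super().__init__()
        self.action_queue = action_queue
        self.obs_queue = obs_queue

    async def on_step(self, iteration):
        # Block this game process until the wrapper sends the next action.
        action = self.action_queue.get()

        # Execute the chosen action in the game world (hypothetical helper).
        await self.execute_action(action)

        # Build the observation and reward (hypothetical helpers), then report
        # back so the wrapper's step() call can return.
        obs = self.build_observation()
        reward = self.compute_reward()
        terminated = False  # e.g. flip to True on victory or defeat
        self.obs_queue.put((obs, reward, terminated, False, {}))
```

The blocking action_queue.get() is exactly what keeps the two processes in lockstep: the game cannot advance past on_step until the agent has chosen, and the agent cannot choose again until the game has reported back.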

System Architecture Diagram

This diagram illustrates the complete, two-process architecture with our SC2GymEnv wrapper as the bridge.

+--------------------------+      +---------------------------+      +---------------------------+
|    stable-baselines3     |      |     SC2GymEnv Wrapper     |      |   python-sc2 Game Proc.   |
|     (Training Loop)      |      |       (Our Adapter)       |      |       (Our RLBotAI)       |
+--------------------------+      +---------------------------+      +---------------------------+
             |                                  |                                  |
             | calls env.step(action)           |                                  |
             |--------------------------------->| action_q.put(action)             |
             |                                  |--------------------------------->| action = action_q.get()
             |                                  |                                  |
             |                                  |                                  | ...plays game step...
             |                                  |                                  |
             |                                  |                                  | new_obs, ... = ...
             |                                  |                                  | obs_q.put(new_obs, ...)
             |                                  | obs, ... = obs_q.get()           |
             |                                  |<---------------------------------|
             |                                  |                                  |
             | returns (obs, reward, ...)       |                                  |
             |<---------------------------------|                                  |
             |                                  |                                  |

   Synchronous Process 1              Inter-Process Queues              Asynchronous Process 2

With this design, the RL agent operates within its standard synchronous loop, completely abstracted from the asynchronous nature of the game it is learning to control. In the next chapter, we will implement this SC2GymEnv class.
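
As a preview, here is a hedged sketch of how the finished wrapper will plug into training. Both calls shown are standard stable-baselines3 APIs; the policy choice and timestep count are arbitrary illustrations:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

env = SC2GymEnv()
check_env(env)  # verify the wrapper satisfies the gymnasium.Env contract

model = PPO("MlpPolicy", env, verbose=1)  # the agent never sees async code
model.learn(total_timesteps=10_000)       # standard synchronous training loop
env.close()
```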